I. THE PROBLEM OF EXPERIMENTAL DESIGN IN SIMULATION


A  computer  simulation,  which  is  a  mathematical  model  of  a  system  in 
the  form  of  a  computer  program,  may  be  viewed  as  a  "black  box"  in  which 
input  factors  (independent  variables)  are  combined  to  produce  an  output 
or response (dependent variable). The simulation is usually used to conduct
an experimental study of the modelled system. Since simulation runs
often  are  very  expensive,  the  simulation  user  may  wish  to  concentrate  on 
only  the  most  important  factors,  that  is,  those  having  a  strong  effect  on 
the output. However, because standard experimental designs found in the
statistical literature often require more simulation runs than are available
to the simulation user, the identification of these factors by means
of  statistically  designed  experiments  can  pose  special  design  and  analysis 
problems. 

In  general,  therefore,  the  primary  difficulty  of  experimental  design 
in  simulation  can  be  succinctly  summarized  as  too  many  factors  and  too  few 
runs.  Because  of  this,  it  is  impossible  to  investigate  thoroughly  all 
factors  under  consideration.  What  is  required,  then,  is  some  means  of 
making  the  available  number  of  computer  runs  and  the  number  of  factors 
compatible. Assuming that time and/or budget limitations prohibit additional
computer runs, there is a need for compromise alternatives that
can feasibly be implemented.

A  general  discussion  of  this  experimental  design  problem  is  presented 
in  this  report.  Three  possible  two-stage  strategies  for  attacking  the 
problem are considered, and performance measures with which to judge the
strategies are described. Each strategy consists of a first stage which
uses  a  nonstandard  approach  to  identify  a  relatively  small  factor  subset 
for  further  consideration.  The  second  stage  examines  this  subset  by  means 
of  a  standard  experimental  design  in  an  attempt  to  eliminate  any  unimportant 
factors  which  were  unknowingly  included  in  the  subset.  Because  these 
strategies  are  designed  for  "screening"  the  factors,  they  are  known  as 
factor  screening  approaches. 


A. DISCUSSION

In some cases expert judgment, based on simulation of similar systems
or on consideration of the processes being simulated, can be used to select
the subset of factors for follow-up experimentation. For example, because
of previous experience, the user of a given simulation may be quite certain
that specific factors will have little or no effect on the response when
compared with the rest of the factors. In this situation, these factors
could be eliminated from the investigation by keeping them fixed at constant
values throughout subsequent experimentation. The remaining factors would
comprise the subset to be analyzed in the second stage.

On the other hand, instead of selecting factors according to expectations
without expending any computer runs, it may prove of value to invest a portion
of the available computer runs in a preliminary first-stage screening experiment.
Of necessity, such a preliminary experiment would, as a rule, involve
considerably fewer computer runs than factors, thus giving rise to confounded
estimates, that is, estimates which are "mixed together" and impossible to
separate by statistical analysis. Confounded estimates provide ambiguous
results which may, if interpreted incorrectly, lead to completely
erroneous conclusions about which factors are important.

Because unconfounded estimates of the effects of K factors cannot be
obtained without a minimum of K + 1 computer runs, the confounding problem
may severely limit the usefulness of any preliminary screening
experiment. Nonetheless, such an experiment may prove of enough value to be
used instead of (or, possibly, in conjunction with) expert judgment.
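The K + 1 requirement can be illustrated through the rank of the model matrix: with one intercept column and K factor columns, N runs can support at most N linearly independent estimates. The following is a minimal sketch, assuming NumPy and an arbitrary random two-level design; the sizes K = 6 and N = 4 are illustrative and do not come from this report.

```python
import numpy as np

def design_rank(n_runs: int, n_factors: int, seed: int = 0) -> int:
    """Rank of the model matrix [1 | X] for a random two-level (+1/-1) design."""
    rng = np.random.default_rng(seed)
    levels = rng.choice([-1.0, 1.0], size=(n_runs, n_factors))
    model_matrix = np.column_stack([np.ones(n_runs), levels])
    return int(np.linalg.matrix_rank(model_matrix))

K = 6
# With fewer than K + 1 runs, the K + 1 parameters cannot all be separated:
# the rank can never exceed the number of runs.
print(design_rank(n_runs=4, n_factors=K), "<", K + 1)
```

Whatever design is chosen, the rank is bounded by the number of runs, so at least K + 1 - N combinations of effects remain inseparable.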


B.  A  FACTOR  SCREENING  MODEL 


In screening, a small number of factor levels is generally employed;
usually  two  are  sufficient.  Suppose,  then,  that  a  simulation  consists  of 
K  factors,  each  of  which  is  at  two  levels,  arbitrarily  designated  "high" 
and  "low." 

The  actual  functional  or  statistical  relationship  between  the  simu¬ 
lation  response  and  the  factors  of  a  simulation  model  will,  of  course,  vary 
from  model  to  model.  However,  in  devising  factor  screening  strategies  for 
use  in  computer  simulation  experiments,  it  is  desirable  to  define  a  common 
statistical model to serve as a basis on which to compare and to assess any
screening  strategies  that  might  be  proposed.  To  that  end,  the  following 
paragraphs  summarize  a  reasonable  and  generally  adequate  screening  model 
that  will  be  assumed  to  underlie  the  simulation  responses. 


Define

$$
x_{ij} =
\begin{cases}
+1, & \text{if factor } j \text{ is at its "high" level for the } i\text{th computer run} \\
-1, & \text{if factor } j \text{ is at its "low" level for the } i\text{th computer run}
\end{cases}
$$

and let $y_i$ denote the simulation response for the $i$th computer run. The
factor screening model assumes that

$$
y_i = \beta_0 + \sum_{j=1}^{K} \beta_j x_{ij} + \varepsilon_i
$$

where $\beta_j$ is the (linear) effect of factor j and the error terms, $\varepsilon_i$, are
independent and normally distributed random variables having zero mean
and variance $\sigma^2$. In essence, this model may be regarded as a first-order
Taylor series approximation to the actual relationship between the $y_i$'s
and the $x_{ij}$'s.
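As an illustration of the screening model, the sketch below generates responses from it. It assumes NumPy; the function name `simulate_responses` and all numerical values (effects, intercept, error standard deviation) are hypothetical choices made only for this example.

```python
import numpy as np

def simulate_responses(x, beta0, beta, sigma, rng):
    """Responses y_i = beta0 + sum_j beta_j * x_ij + e_i, with e_i ~ N(0, sigma^2)."""
    x = np.asarray(x, dtype=float)
    n_runs = x.shape[0]
    noise = rng.normal(0.0, sigma, size=n_runs) if sigma > 0 else np.zeros(n_runs)
    return beta0 + x @ np.asarray(beta, dtype=float) + noise

rng = np.random.default_rng(1)
K, N = 5, 8
beta = np.array([2.0, 0.0, 0.0, -1.5, 0.0])   # hypothetical effects: two active factors
x = rng.choice([-1.0, 1.0], size=(N, K))      # arbitrary two-level factor settings
y = simulate_responses(x, beta0=10.0, beta=beta, sigma=0.5, rng=rng)
```

Setting `sigma=0` gives the deterministic case, in which each response is an exact linear function of the factor settings.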

In terms of the model, factor j will be termed active if and only if
$\beta_j \neq 0$, and inactive if and only if $\beta_j = 0$. Furthermore, under the adopted
parameterization, $\beta_j$ can be interpreted as one-half of the average difference between
the true simulation responses at the high and at the low levels of the $j$th
factor. Hence $\beta_j > 0$ only if the factor level producing the larger true
response is labeled as the high (+1) level. It is assumed that only a
relatively small number, k, of the K factors are active.

Under this nomenclature, the basic aim of any screening procedure is
to classify, efficiently and effectively, the K factors under investigation
as active or as inactive.


II. THREE FACTOR SCREENING APPROACHES


This  paper  considers  three  possible  factor  screening  approaches.  Each 
approach  is  a  two-stage  strategy  which  combines  a  nonstandard  first  stage 
procedure  with  a  second  stage  that  employs  a  standard  experimental  design 
known  as  a  Plackett-Burman  design.  [See  Plackett  and  Burman  (1946).]  This 
design is a two-level orthogonal design for studying up to 4m-1 factors in
4m  runs.  Because  of  the  orthogonality,  there  is  no  confounding  (i.e.,  mixing 
together)  of  factor  effects  in  the  second  stage. 
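For concreteness, the sketch below builds the 12-run member of this family (m = 3, up to 11 factors) from the generator row commonly tabulated for it and checks orthogonality numerically. It assumes NumPy; the function name `plackett_burman_12` is introduced here for illustration only.

```python
import numpy as np

# Commonly tabulated generator row for the 12-run design.
PB12_GENERATOR = np.array([+1, +1, -1, +1, +1, +1, -1, -1, -1, +1, -1])

def plackett_burman_12() -> np.ndarray:
    """Return the 12 x 11 two-level design (rows are runs, columns are factors)."""
    # Eleven runs are cyclic shifts of the generator row.
    cyclic_rows = [np.roll(PB12_GENERATOR, shift) for shift in range(11)]
    # The twelfth run sets every factor to its low level.
    last_row = -np.ones(11, dtype=int)
    return np.vstack(cyclic_rows + [last_row])

design = plackett_burman_12()
# For an orthogonal design this is 12 on the diagonal and 0 elsewhere.
gram = design.T @ design
```

Because every pair of columns has zero inner product and every column sums to zero, each factor effect is estimated independently of the others and of the overall mean.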

A.  EXPERT  JUDGMENT 


The  first  approach  assumes  that  the  analyst  (i.e.,  the  simulation  user) 
feels  he  or  she  can  do  a  good  job  of  deciding  which  factors  are  active  and 
which  are  inactive.  Thus,  the  analyst  will,  using  expert  judgment,  select 
those factors to be carried over into the second stage. Assume, for the sake
of simplicity of analysis, that

(1) P(Analyst identifies a factor as active | the factor is active) = $r_1$
and (2) P(Analyst identifies a factor as inactive | the factor is inactive) = $r_2$.

Of course, if $r_1 = r_2 = 1.0$, the analyst's judgment is perfect. However,
as the probabilities $r_1$ and $r_2$ decrease from 1.0, the effectiveness of this
method also decreases. Although the second stage Plackett-Burman design
applied to factors selected in the first stage helps guard against the
misclassification of inactive factors, any active factor not selected by the
analyst in the first stage will never be classified correctly.
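By linearity of expectation, assumptions (1) and (2) give the expected composition of the subset passed to the second stage. The sketch below is only illustrative: the function name `expert_judgment_summary` and the values $r_1 = r_2 = 0.9$ are hypothetical, while K = 120 and k = 10 match the case reported later for Figure 2.

```python
def expert_judgment_summary(K: int, k: int, r1: float, r2: float) -> dict:
    """Expected first-stage outcomes under assumptions (1) and (2)."""
    active_kept = k * r1                   # active factors correctly carried forward
    inactive_kept = (K - k) * (1.0 - r2)   # inactive factors carried forward in error
    active_lost = k * (1.0 - r1)           # active factors that can never be recovered
    return {
        "active_kept": active_kept,
        "inactive_kept": inactive_kept,
        "active_lost": active_lost,
        "second_stage_factors": active_kept + inactive_kept,
    }

summary = expert_judgment_summary(K=120, k=10, r1=0.9, r2=0.9)
```

In this hypothetical case the second stage would be expected to examine about 20 factors, roughly half of them inactive, while one active factor on average is lost for good.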

B. GROUP SCREENING


Group screening has been discussed in a number of papers [e.g., Watson
(1961), Li (1962), Mauro and Smith (1980)]. In group screening, "group-factors"
are created by partitioning the individual factors into a number
of groups. The two-stage group screening procedure considered here relies
on a Plackett-Burman design to test for significant group-factor effects in
the initial stage. However, this design is used in a nonstandard manner
since all factors in a given group appear at the same level during a simulation
run.

For example, suppose that the m factors $x_1, x_2, \ldots, x_m$ form one group-factor.
Then, whenever this particular group-factor appears at its high
(+1) level in the Plackett-Burman design, all component factors $x_1, x_2, \ldots, x_m$
would be at their high levels. Thus, the effects of $x_1, x_2, \ldots, x_m$ are completely
confounded, so that if the group-factor is found to have an effect,
it cannot be determined which of the factors $x_1, x_2, \ldots, x_m$, or how many of them,
have an effect. The second stage Plackett-Burman follow-up, therefore,
helps to resolve this question by examining all individual factors comprising
the group-factors judged significant in the first stage.

The  first  stage  experiment  requires  N  runs,  where  N  is  the  smallest 
integer  which  is  a  multiple  of  four  and  also  greater  than  the  number  of 
group  factors.  Furthermore,  unlike  the  expert  judgment  approach,  group 
screening  examines  all  of  the  original  K  factors  experimentally;  none  are 
excluded  from  experimentation  in  the  first  stage.  However,  the  possibility 
of  cancellation  of  effects  within  a  group  factor  exists.  That  is,  individual 
factors could possibly have offsetting positive and negative effects. In
such a case, these factors would not be brought over into the second stage
and would therefore be misclassified as inactive.

Because of this possibility, the definition of high and low factor
levels should be made so that all factor effects are anticipated to have
the same direction, e.g., to all be positive. If all the effects have
the same direction, cancellation is impossible. Mauro and Smith (1980)
have examined, in the case $\sigma = 0$, the performance of group screening when
some effect directions are incorrectly assumed.
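Cancellation can be made concrete in the noise-free case. When all factors in a group share one level, the screening model implies that the group-factor effect is the sum of the individual effects in that group, so a group is flagged only if that sum is nonzero. The sketch below assumes NumPy, equal-sized consecutive groups, and hypothetical effect values; the function names are introduced here for illustration.

```python
import numpy as np

def group_effects(beta, group_size):
    """Sum the individual effects within consecutive groups of equal size."""
    beta = np.asarray(beta, dtype=float)
    return beta.reshape(-1, group_size).sum(axis=1)

def factors_carried_forward(beta, group_size, tol=1e-12):
    """Indices of factors whose group shows a nonzero effect (noise-free case)."""
    flagged = np.abs(group_effects(beta, group_size)) > tol
    return np.flatnonzero(np.repeat(flagged, group_size))

# Hypothetical effects for K = 9 factors in groups of 3.
beta = [1.0, 0.0, 0.0,    # group 0: one active factor, detected
        1.0, -1.0, 0.0,   # group 1: two active factors whose effects cancel
        0.0, 0.0, 0.0]    # group 2: no active factors
carried = factors_carried_forward(beta, group_size=3)
```

Only the factors of group 0 reach the second stage; the two active factors in group 1 are dropped and end up misclassified as inactive.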

C.  RANDOM  BALANCE 


In  the  random  balance  approach,  all  K  original  factors  are  included 
in  a  first  stage  experiment  of  N  runs.  Because  of  the  constraints  on  the 
number  of  runs,  N<K.  Subject  to  this  restriction,  the  value  of  N  can  be 
whatever  the  analyst  chooses,  except  that  it  should  be  an  even  number. 

In  the  initial  experiment,  each  factor  appears  at  its  high  level  N/2 
times  and  at  its  low  level  N/2  times  during  the  N  runs,  with  the  order  of 
high  and  low  levels  selected  at  random.  Although  this  guarantees  that  the 
factor  effects  are  unconfounded  with  the  overall  mean  effect,  they  are 
confounded  with  each  other.  Furthermore,  the  confounding  is  random.  In 
addition,  no  standard  analysis  techniques  for  random  balance  data  exist, 
although  a  number  have  been  suggested. 
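A first-stage random balance design as described above can be sketched as follows. The example assumes NumPy; the function name `random_balance_design` and the sizes N = 8 and K = 20 are illustrative choices, not values from this report.

```python
import numpy as np

def random_balance_design(n_runs: int, n_factors: int, rng) -> np.ndarray:
    """Each column holds n_runs/2 high (+1) and n_runs/2 low (-1) levels in random order."""
    if n_runs % 2 != 0:
        raise ValueError("n_runs must be even")
    base = np.repeat([1, -1], n_runs // 2)
    # Permute the balanced column independently for every factor.
    return np.column_stack([rng.permutation(base) for _ in range(n_factors)])

rng = np.random.default_rng(2)
design = random_balance_design(n_runs=8, n_factors=20, rng=rng)
column_sums = design.sum(axis=0)      # all zero: unconfounded with the overall mean
cross_products = design.T @ design    # off-diagonal entries measure confounding between factors
```

Since N < K, the columns cannot all be mutually orthogonal, so some off-diagonal cross products are necessarily nonzero; which ones depends on the random ordering.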

However, proponents of random balance [e.g., Satterthwaite (1959) and
Budne (1959)] have emphasized that, in general, the degree of confounding
is relatively small and analysis poses no great problem. Nonetheless,
random balance has received a very bad name in the statistical community,
mainly because of the random confounding of factor effects. Although the
objections are based on good statistical reasoning, no empirical evidence
is available to support either proponents or opponents of random balance.
Mauro and Smith (1981) are currently investigating the performance of random
balance when a standard one-factor analysis of variance F-test is used
as the method of analysis. The second stage Plackett-Burman design includes
all factors judged significant in the random balance experiment.


III. PERFORMANCE MEASURES

In attempting to identify the important factors for detailed investigation,
there are two conflicting requirements: limiting factor misclassification
and limiting the expenditure of runs. Before different factor screening approaches
may be compared, these requirements must be quantified. In assessing performance,
Smith and Mauro (1980) considered the values of expected loss
and expected relative testing cost.

In  order  to  measure  the  severity  of  classification  error,  consider  the 
class  of  loss  functions  given  by 

$$
L = \sum_{j=1}^{K} w_j \delta_j \Big/ \sum_{j=1}^{K} w_j
$$

where

$$
\delta_j =
\begin{cases}
0, & \text{if the } j\text{th factor is correctly identified} \\
1, & \text{if the } j\text{th factor is incorrectly identified,}
\end{cases}
$$

and $w_j$ denotes the loss incurred ($w_j > 0$) if the $j$th factor is misclassified.

Note that L is a function of $\beta_1, \ldots, \beta_K$ and lies in the interval [0,1].

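For the particular case in which

$$
|\beta_j| =
\begin{cases}
\Delta, & \text{if factor } j \text{ is active} \\
0, & \text{if factor } j \text{ is inactive,}
\end{cases}
$$

it is reasonable to let

$$
w_j =
\begin{cases}
1/(2k), & \text{if factor } j \text{ is active} \\
1/[2(K-k)], & \text{if factor } j \text{ is inactive,}
\end{cases}
$$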

since  this  apportions  one-half  of  the  overall  maximum  loss  to  the  active 
factors  and  the  other  half  to  the  inactive  factors.  Hence,  in  this  case 
the  loss  L  reduces  to 

$$
L = \frac{(K-k)(k-A) + k(K-k-I)}{2k(K-k)}
$$

where  A  denotes  the  number  of  active  factors  correctly  identified  and  I 
denotes  the  number  of  inactive  factors  correctly  identified. 
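The reduction can be checked numerically by computing the general weighted loss and the closed form side by side. In the sketch below the function names and the counts A = 8 and I = 105 are hypothetical; they are not results from this report.

```python
def weighted_loss(weights, misclassified):
    """General loss L = sum(w_j * d_j) / sum(w_j), with d_j = 1 for a misclassified factor."""
    total = sum(weights)
    return sum(w * d for w, d in zip(weights, misclassified)) / total

def equal_split_loss(K: int, k: int, A: int, I: int) -> float:
    """Closed form when active factors weigh 1/(2k) and inactive ones 1/(2(K-k))."""
    return ((K - k) * (k - A) + k * (K - k - I)) / (2 * k * (K - k))

# Illustrative counts: 8 of 10 active and 105 of 110 inactive factors
# are classified correctly.
K, k, A, I = 120, 10, 8, 105
weights = [1 / (2 * k)] * k + [1 / (2 * (K - k))] * (K - k)
misclassified = [0] * A + [1] * (k - A) + [0] * I + [1] * (K - k - I)

general = weighted_loss(weights, misclassified)
closed_form = equal_split_loss(K, k, A, I)
```

The two values agree, and the closed form returns 0 when every factor is classified correctly and 1 when every factor is misclassified.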

The  second  performance  measure  discussed  by  Smith  and  Mauro  (1980) 
takes  into  account  the  total  number  of  runs,  R,  that  a  factor  screening 
approach  requires.  The  testing  cost  may  be  defined  relative  to  the  number 
of  runs  required  for  a  Plackett-Burman  design  applied  to  all  K  original 
factors.  Thus,  the  relative  testing  cost  Q  is  given  by 

$$
Q = \phi(R) / \phi(K^*)
$$

where $\phi(M)$ represents the expense of conducting M runs, and $K^*$ denotes the
number of runs required by a Plackett-Burman design for K factors. If $\phi(M)$
is assumed proportional to M, then

$$
Q = R / K^*.
$$
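Under the proportional-cost assumption, Q is a simple ratio of run counts. The sketch below takes the number of Plackett-Burman runs to be the smallest multiple of four exceeding the number of factors, which follows from the 4m-run, (4m-1)-factor property stated in Section II. The function names and the run counts of the example strategy are hypothetical.

```python
def plackett_burman_runs(n_factors: int) -> int:
    """Smallest multiple of four exceeding n_factors (4m runs cover up to 4m-1 factors)."""
    return 4 * (n_factors // 4 + 1)

def relative_testing_cost(total_runs: int, K: int) -> float:
    """Q = R / K*, assuming the expense of M runs is proportional to M."""
    return total_runs / plackett_burman_runs(K)

# Hypothetical two-stage strategy for K = 120: 40 first-stage runs plus a
# second-stage design on 15 retained factors.
R = 40 + plackett_burman_runs(15)
Q = relative_testing_cost(R, K=120)
```

A value of Q below 1 means the two-stage strategy uses fewer runs than a single Plackett-Burman design on all K factors.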

It  should  be  noted  that  in  most  screening  strategies  both  L  and  Q  are 
random  variables.  Thus,  in  assessing  the  performance  of  a  factor  screening 
approach,  it  is  reasonable  to  examine  their  expected  values. 

Both  expected  loss  and  expected  relative  testing  cost  must  be  jointly 
considered  in  evaluating  the  overall  performance  of  a  factor  screening  strategy. 
In  some  sense  the  problem  is  akin  to  the  testing  of  a  statistical  hypothesis 
in  which  the  probabilities  of  Type  I  error  (rejecting  a  true  null  hypothesis) 
and  Type  II  error  (accepting  a  false  null  hypothesis)  are  both  desired  small, 
but  are  inversely  related. 


The  simulation  user  may  wish  to  specify  joint  values  of  expected  loss 
and  expected  relative  testing  cost  that  are  acceptable.  For  example,  the 
user  may  place  an  upper  limit  on  expected  loss  and  then,  subject  to  this 
constraint, select the screening approach having the minimum expected relative
testing cost.

Only if one screening strategy has both a smaller expected loss, E(L),
and a smaller expected relative testing cost, E(Q), than another strategy can the
first  be  said  to  be  definitely  better  than  the  second.  Otherwise,  the 
decision  depends  upon  the  analyst's  trade-offs.  For  example,  by  looking  at 
Figure  1,  it  is  clear  that  all  analysts  would  select  strategy  A  over  either 
strategy  B  or  D.  However,  one  analyst  might  prefer  A  over  E  because  of  the 
smaller  E(Q)  while  another  might  prefer  E  over  A  because  of  the  smaller  E(L). 
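The comparison rule amounts to a dominance check on the pair (E(L), E(Q)). The sketch below uses made-up coordinates chosen only to reproduce the qualitative pattern described in the text; they are not read from Figure 1, and the function name `dominates` is introduced here for illustration.

```python
def dominates(first, second) -> bool:
    """True if `first` has both a smaller E(L) and a smaller E(Q) than `second`."""
    return first[0] < second[0] and first[1] < second[1]

# Hypothetical (E(L), E(Q)) pairs; these are not values from Figure 1.
strategies = {
    "A": (0.10, 0.30),
    "B": (0.20, 0.40),
    "D": (0.15, 0.60),
    "E": (0.05, 0.50),
}

a_beats_b = dominates(strategies["A"], strategies["B"])
a_beats_d = dominates(strategies["A"], strategies["D"])
# Neither A nor E dominates the other, so the choice depends on the analyst's trade-off.
a_or_e_dominates = (dominates(strategies["A"], strategies["E"])
                    or dominates(strategies["E"], strategies["A"]))
```

With these coordinates A dominates both B and D, while A and E are not comparable: A has the smaller E(Q) and E has the smaller E(L).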


IV. SOME PRELIMINARY RESULTS


Ongoing research by Desmatics, Inc. is examining the performance
measures E(L) and E(Q) for the situation where all active factors are
such that $|\beta_j| = \Delta$, $\phi(M)$ is proportional to M, and the incurred loss $w_j$
is as defined in the previous section. Within this framework, the following
cases are being considered:

K = 60, 120, 240

$k = p^* K$ ($p^*$ = 2/60, 3/60, 5/60, 8/60)

$\sigma = r\Delta$ (r = 0, r > 0)

Research to date has considered only the deterministic case (i.e., r = 0).
Future research will address the case where random error is present.

Figure 2 exhibits results for the specific case K = 120, k = 10, and
$\sigma = 0$. In the deterministic situation, E(A), the expected number of active
factors identified, is equal to k[1-2E(L)] for the three approaches considered
in this report. Thus, both E(L) and E(A) are presented in the figure.
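The relation between E(A) and E(L) follows from the loss formula of Section III when no inactive factor is misclassified, that is, when I = K - k. That condition is an assumption made here for the derivation: it is plausible for $\sigma = 0$, since an orthogonal second stage then estimates each retained effect without error, but the report does not state it explicitly.

```latex
\begin{align*}
L &= \frac{(K-k)(k-A) + k\,(K-k-I)}{2k(K-k)} \\
  &= \frac{(K-k)(k-A)}{2k(K-k)} = \frac{k-A}{2k}
     \qquad \text{(setting } I = K-k \text{)} \\
\Rightarrow\quad A &= k\,(1 - 2L),
     \qquad E(A) = k\,[\,1 - 2E(L)\,].
\end{align*}
```

The last step uses only the linearity of expectation, since k is a fixed constant.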

As will be noted, there are a number of points corresponding to each
of the three strategies. For the expert judgment strategy, performance depends
on the values of the probabilities $r_1$ and $r_2$. The figure gives results
for various values of $r_1 = r_2$. For group screening, performance depends on g,
the group size, and on i, the number of misspecified factor effect directions.
The figure provides results for g = 3, 5, and 8 and i = 0, 1, 2, 3, 4, 5. For
random balance, performance depends on c, where c = N/K, and on $\alpha$, the
significance level for the F-test used in analyzing the first-stage data. The results
in Figure 2 correspond to various values of c and $\alpha$ in the ranges .2 < c < .8
and .10 < $\alpha$ < .50.